feat(backend): packed-quant matmul dispatch in DefaultCpuOpsBase (wor… by michalharakal · Pull Request #711 · SKaiNET-developers/SKaiNET

michalharakal · 2026-06-08T09:42:43Z

…ks on Native)

Part of #708. Makes ops.matmul(x, ops.transpose(W)) route packed-quant weights to a kernel on EVERY KMP target. Before this, the packed-quant matmul dispatch + lazy transpose lived only in DefaultCpuOpsJvm, so on Kotlin/Native/JS/WASM a packed weight fell through to matmulGeneric, which throws on Byte-packed data — packed matmul was effectively broken off-JVM.
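The call shape `ops.matmul(x, ops.transpose(W))` only works on packed data if the transpose never has to move bytes. A minimal sketch of that idea follows, assuming a shape-swapping view. `PackedWeight` and `matmulOutputWidth` are hypothetical names for illustration and are not the SKaiNET types.

```kotlin
// Hypothetical stand-in for a packed-quant weight; not the SKaiNET tensor type.
// `bytes` holds the quantized blocks row by row and is never rearranged.
class PackedWeight(
    val rows: Int,
    val cols: Int,
    val bytes: ByteArray,
    val transposed: Boolean = false,
) {
    // The logical shape is swapped when the view is transposed.
    val shape: Pair<Int, Int>
        get() = if (transposed) cols to rows else rows to cols

    // Lazy transpose: flips a flag and shares the same byte array.
    fun transpose(): PackedWeight = PackedWeight(rows, cols, bytes, !transposed)
}

// Checks the shapes of x [m, xCols] × transpose(W) and returns the output width.
// A real dispatcher would hand the untouched packed rows to a kernel at this point.
fun matmulOutputWidth(xCols: Int, w: PackedWeight): Int {
    val (inner, outer) = w.shape
    require(xCols == inner) { "inner dimensions differ: $xCols vs $inner" }
    require(w.transposed) { "this sketch only handles x × transpose(W)" }
    return outer
}
```

Because the packed rows run along the reduction dimension, a kernel can compute one output element per packed row without ever materialising the transposed matrix.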

- New Q5_1/Q5_0 packed tensor-data types + TensorEncoding.Q5_0/Q5_1 (lang-core).
- DefaultCpuOpsBase: `chooseQuantizedMatmulHeap` resolves the kernel via the commonMain KernelRegistry (scalar floor on Native/JS/WASM; Panama/FFM on JVM via the ensureKernelProviders() hook + ServiceLoader) and dispatches FP32 × packed {Q8_0,Q4_0,Q4_K,Q6_K,Q5_1,Q5_0}; lazy-transpose shape-swap branches for the four heap K/Q5 types. The JVM ops keep their MemSeg/SIMD fast paths and intercept Q4_K/Q6_K/Q8_0/Q4_0 before the base, so there is zero JVM regression by construction; Q5_1/Q5_0 (and the whole set on non-JVM) resolve in the base.
- Non-JVM platform factories (linux/apple/js/wasm/wasmWasi/android) register ScalarKernelProvider (no ServiceLoader off-JVM).
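The registry-based resolution described in the list above can be sketched roughly as follows. This is a simplified model, not the SKaiNET API: `Encoding`, `MatmulKernel`, `KernelProvider`, `SimpleKernelRegistry`, `ScalarFloorProvider`, and `FastSubsetProvider` are hypothetical names, the priority scheme is an assumption, and the kernel bodies are placeholders that return zeros.

```kotlin
// Hypothetical stand-ins; the real SKaiNET types and signatures may differ.
enum class Encoding { Q8_0, Q4_0, Q4_K, Q6_K, Q5_1, Q5_0 }

fun interface MatmulKernel {
    // x: FP32 activations [k]; packed: quantized bytes for n rows; returns [n].
    fun run(x: FloatArray, packed: ByteArray, n: Int, k: Int): FloatArray
}

interface KernelProvider {
    val name: String
    val priority: Int // higher priority wins when several providers match
    fun kernelFor(encoding: Encoding): MatmulKernel?
}

// Portable fallback that claims every encoding (placeholder kernel body).
class ScalarFloorProvider : KernelProvider {
    override val name = "scalar"
    override val priority = 0
    override fun kernelFor(encoding: Encoding): MatmulKernel =
        MatmulKernel { _, _, n, _ -> FloatArray(n) }
}

// Accelerated provider that only covers some encodings (placeholder kernel body).
class FastSubsetProvider(private val supported: Set<Encoding>) : KernelProvider {
    override val name = "fast"
    override val priority = 10
    override fun kernelFor(encoding: Encoding): MatmulKernel? =
        if (encoding in supported) MatmulKernel { _, _, n, _ -> FloatArray(n) } else null
}

class SimpleKernelRegistry {
    private val providers = mutableListOf<KernelProvider>()

    fun register(provider: KernelProvider) {
        providers += provider
    }

    // Returns the provider name and kernel of the best match, or null if none.
    fun resolve(encoding: Encoding): Pair<String, MatmulKernel>? =
        providers.sortedByDescending { it.priority }
            .firstNotNullOfOrNull { p -> p.kernelFor(encoding)?.let { k -> p.name to k } }
}
```

With only the scalar provider registered, every encoding resolves to it, which mirrors the non-JVM situation. Registering a faster provider for a subset leaves the remaining encodings on the scalar floor.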

Tests: PackedMatmulDispatchTest (commonTest) runs Q4_K + Q5_1 through ctx.ops.matmul(x, transpose(W)) and matches the dequant reference — green on jvmTest AND linuxX64Test (the Native end-to-end proof). Full backend-cpu jvmTest suite passes (no regression); apiDump regenerated for lang-core + backend-cpu.
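The "matches the dequant reference" check can be illustrated with a small self-contained sketch. It assumes a GGML-style Q5_1 block (32 weights, value = d · q + m, low four bits in nibbles, fifth bit in a 32-bit mask), which the PR does not spell out. Scales are stored as plain `Float` for brevity, where GGML uses fp16. All names here (`Q51Block`, `quantizeBlock`, `packedMatVec`, `referenceMatVec`) are hypothetical and unrelated to the actual test code.

```kotlin
const val BLOCK = 32

// One illustrative Q5_1-style block: value(i) = d * q(i) + m, with q in 0..31.
class Q51Block(val d: Float, val m: Float, val qh: Int, val qs: ByteArray)

fun quantizeBlock(w: FloatArray, off: Int): Q51Block {
    var lo = Float.POSITIVE_INFINITY
    var hi = Float.NEGATIVE_INFINITY
    for (i in 0 until BLOCK) { lo = minOf(lo, w[off + i]); hi = maxOf(hi, w[off + i]) }
    val d = (hi - lo) / 31f
    val inv = if (d != 0f) 1f / d else 0f
    var qh = 0
    val qs = ByteArray(BLOCK / 2)
    for (j in 0 until BLOCK / 2) {
        val q0 = ((w[off + j] - lo) * inv + 0.5f).toInt().coerceIn(0, 31)
        val q1 = ((w[off + j + 16] - lo) * inv + 0.5f).toInt().coerceIn(0, 31)
        qs[j] = ((q0 and 0x0F) or ((q1 and 0x0F) shl 4)).toByte()
        qh = qh or ((q0 shr 4) shl j) or ((q1 shr 4) shl (j + 16))
    }
    return Q51Block(d, lo, qh, qs)
}

// Reassembles the 5-bit quant of element i (0..31) from nibble + high bit.
fun quantAt(b: Q51Block, i: Int): Int {
    val byte = b.qs[i % 16].toInt() and 0xFF
    val low = if (i < 16) byte and 0x0F else byte ushr 4
    return low or (((b.qh ushr i) and 1) shl 4)
}

fun dequantize(row: List<Q51Block>): FloatArray {
    val out = FloatArray(row.size * BLOCK)
    row.forEachIndexed { b, blk ->
        for (i in 0 until BLOCK) out[b * BLOCK + i] = blk.d * quantAt(blk, i) + blk.m
    }
    return out
}

// Kernel-style path: works on quants directly, applying d and m once per block.
fun packedMatVec(x: FloatArray, rows: List<List<Q51Block>>): FloatArray =
    FloatArray(rows.size) { n ->
        var acc = 0f
        rows[n].forEachIndexed { b, blk ->
            var dot = 0f
            var sum = 0f
            for (i in 0 until BLOCK) {
                val xv = x[b * BLOCK + i]
                dot += xv * quantAt(blk, i)
                sum += xv
            }
            acc += blk.d * dot + blk.m * sum
        }
        acc
    }

// Reference path: dequantize each row to FP32, then take a dense dot product.
fun referenceMatVec(x: FloatArray, rows: List<List<Q51Block>>): FloatArray =
    FloatArray(rows.size) { n ->
        val w = dequantize(rows[n])
        var acc = 0f
        for (k in w.indices) acc += x[k] * w[k]
        acc
    }
```

The two paths differ only in floating-point summation order, so a comparison of this kind needs a small tolerance rather than exact equality.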

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…-quant-kernels

# Conflicts:
#   skainet-lang/skainet-lang-core/src/commonMain/kotlin/sk/ainet/lang/tensor/data/Q5_0TensorData.kt
#   skainet-lang/skainet-lang-core/src/commonMain/kotlin/sk/ainet/lang/tensor/data/Q5_1TensorData.kt

github-actions · 2026-06-08T09:46:42Z

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

Operator documentation: docs/modules/operators/_generated_/
JSON schema output: operators.json

Artifacts:

Download the documentation-preview-711 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

michalharakal and others added 2 commits June 8, 2026 11:07

michalharakal merged commit 76b315e into develop Jun 8, 2026
11 checks passed

michalharakal deleted the feature/708-native-quant-kernels branch June 8, 2026 10:16

michalharakal merged 2 commits into develop from feature/708-native-quant-kernels
